Genome Characterisation of the CGMMV Virus Population in Australia—Informing Plant Biosecurity Policy

The detection of cucumber green mottle mosaic virus (CGMMV) in the Northern Territory (NT), Australia, in 2014 led to the introduction of strict quarantine measures for the importation of cucurbit seeds by the Australian federal government. Further detections in Queensland, Western Australia (WA), New South Wales and South Australia occurred in the period 2015–2020. To explore the diversity of the current Australian CGMMV population, 35 new complete coding sequence genomes for CGMMV isolates from Australian incursions and surveys were prepared for this study. In conjunction with published genomes from the NT and WA, sequence, phylogenetic, and genetic variation and variant analyses were performed, and the data were compared with those for international CGMMV isolates. Based on these analyses, it can be inferred that the Australian CGMMV population resulted from a single virus source via multiple introductions.


Introduction
Cucumber green mottle mosaic virus (CGMMV) is a species of the family Virgaviridae and genus Tobamovirus [1]. It is a positive-sense, single-stranded, rod-shaped virus with a monopartite genome of 6.4 kb [2]. Four open reading frames (ORFs) encode proteins of 186 K, 129 K, 29 K (movement protein) and 17.3 K (coat protein) [3]. First described in 1935 in England in Cucumis sativus (cucumber) [4], CGMMV infects plants in the Cucurbitaceae family as well as a number of weeds and wild plants from the Amaranthaceae, Apiaceae, Boraginaceae, Chenopodiaceae, Lamiaceae, Portulacaceae and Solanaceae families [5][6][7]. Mechanical and seed transmission are the primary modes of spread [8]. It has also been reported that pollinators such as Apis mellifera (European honeybee) can spread the virus while foraging [9].
The worldwide distribution of CGMMV includes Africa, Asia, Australia, Europe, the Middle East and North America, with the majority of detections having occurred between 1986 and 2016 [8,10]. The first Australian detection of CGMMV occurred in September 2014, when mosaic and mottle symptoms were observed on commercial Citrullus lanatus (watermelon) crops near Katherine and Darwin, Northern Territory (NT) [11]. Delimiting surveys reported an additional 26 NT locations with virus-infected cucurbit crops or weeds. By March 2015, eradication was considered technically unfeasible and management practices were instigated. In subsequent years, outbreaks have occurred in Queensland (QLD), Western Australia (WA), South Australia (SA) and New South Wales (NSW), with eradication or management plans being implemented in production areas [12][13][14].

Virus Isolates
Thirty-nine CGMMV isolates were used to generate genomes for this study (Table 1). Two C. sativus isolates collected during outbreaks of CGMMV in Queensland were provided as freeze-dried material by the Department of Agriculture and Fisheries, Brisbane, Queensland, Australia. Twenty hive-collected pollen samples and RNA extracts of C. lanatus, C. lanatus var. lanatus, Eleusine indica and Solanum nigrum were provided by the Department of Industry, Tourism and Trade (DITT), Northern Territory. Crop Health Services (CHS), Agriculture Victoria Research (AVR), Victoria, provided RNA from four C. sativus leaf and fruit samples collected from properties on the Northern Adelaide Plains, South Australia, affected during an outbreak in 2019 and subsequent surveillance in 2020. Four seed interception isolates were sourced from the Australian diagnostic laboratories, EMAI (NSWDPI) and CHS (AVR). Raw sequence data were provided for the original Northern Territory and Queensland detections in C. lanatus and for a C. sativus isolate collected in the Western Sydney cropping region and a C. lanatus isolate from the Sunraysia vegetable growing region, both collected during surveillance activities.

Table 1 (excerpt).
Sample name   Isolate label   Collection location   Year   Host/source   Data source
DWP1          NT-2019-07      Northern Territory    2019   Pollen        X1
DWP2          NT-2019-08      Northern Territory    2019   Pollen        X1
DWP-A         NT-2019-09      Northern Territory    2019   Pollen        X2
RDP1          NT-2019-10      Northern Territory    2019   Pollen        X1
DW20P1        NT-2020-01      Northern Territory    2020   Pollen        X1
DW20P3        NT-2020-02      Northern Territory    2020   Pollen        X1
RH20P2        NT-2020-03      Northern Territory    2020   Pollen        X1
RH20P3        NT-2020-04      Northern Territory    2020   Pollen        X1
RHRSP3        NT-2020-05      Northern Territory    2020   Pollen        X1
Q6393         QLD
The "Sample name" is the provider identifier and the "Isolate label" is used for sequence and phylogenetic analysis. Collection location includes seed isolates intercepted at the Australian border. Pollen isolates were sourced from hive-collected pollen collected from mixed plant species. Data source denotes whether the isolate was sequenced using the Illumina NovaSeq (X1) or MiSeq (X2) platforms, CGMMV tiled-amplicon multiplex PCR and ONT MinION sequencing (M), or whether raw HTS data were provided (R).

RNA Extraction and RT-PCR Amplification
Total RNA was extracted from each pollen sample using the RNeasy Plant Mini Kit (QIAGEN, Doncaster, VIC, Australia). Starting with 0.05 g of hive-collected pollen, samples were homogenised using approximately 600 µL of 3 mm solid-glass beads (Merck Pty. Ltd., Bayswater, VIC, Australia) and 600 µL QIAGEN RLT buffer. After homogenisation, 300 µL RLT buffer and 10 µL β-mercaptoethanol were added to the homogenate, and the extraction was completed according to the manufacturer's instructions. Leaf and fruit samples that had been submitted to Agriculture Victoria's Crop Health Services diagnostics laboratory and the NSW isolates collected during surveillance were extracted using the RNeasy Plant Mini Kit with a modified lysis buffer [18]. RNA of isolates that were provided by Northern Territory colleagues and of the original NT isolate were extracted using the Isolate II RNA Plant Kit (Bioline (Aust) Pty Ltd., Eveleigh, NSW, Australia) as per the manufacturer's instructions. The QLD isolate Q6393 was prepared using the QIAGEN BioSprint plant DNA Kit (QIAGEN, Doncaster, VIC, Australia) as per the manufacturer's instructions, omitting the RNase A from the RLT extraction buffer.
Individual samples of imported seed lots, each sample comprising 100 seeds, were crushed and homogenised in 5× phosphate-buffered saline containing 0.25% (v/v) Tween® 20 and 2% (w/v) polyvinylpyrrolidone 40,000, which was added at a rate of 9 mL of buffer to 1 g of seed. RNA was extracted according to the manufacturer's instructions using a 100 µL aliquot of homogenate added to 450 µL RLT buffer (RNeasy Plant Mini Kit, QIAGEN, Doncaster, VIC, Australia).
RT-PCR and RT-qPCR tests were conducted using the GoTaq Probe 1-step RT-qPCR System (Promega Corporation, Alexandria, NSW, Australia) according to the manufacturer's instructions. Extracts were tested for the presence of CGMMV using primers targeting the coat protein [19], the RNase helicase subunit [20] and the movement protein [21] (Table 2). The PCR products were analysed by electrophoresis in 1.5% agarose gel stained with SYBR™ Safe DNA gel stain (Thermo Fisher Scientific, Scoresby, VIC, Australia). Fragment sizes were determined by comparison against the Invitrogen™ 1 Kb plus DNA ladder (Thermo Fisher Scientific, Scoresby, VIC, Australia).

Metagenomic Sequencing and Bioinformatics
Sequencing libraries for 25 isolates were prepared using an Illumina® TruSeq® Stranded Total RNA with Ribo-Zero Plant preparation kit, as described previously [22]. Libraries were sequenced using the Illumina MiSeq with a paired read length of 2 × 250 bp or the Illumina NovaSeq with a paired read length of 2 × 150 bp (see Table 1).
Raw sequence reads generated in this study and those supplied by collaborators were quality-filtered (quality score ≥ 20, minimum read length: 50), adapter sequences were trimmed and read pairs were validated using Trim Galore! [23]. Read pairs were merged using fastp (version 0.20.0) [24]. De novo assembly was performed with SPAdes (version 3.15.2) [25] using the options --rnaviral and -k 127,107,87,67,31 [26]. Assembled contigs of 1000 nt or more were analysed using BLASTn (version 2.9.0) [27]. Trimmed reads were mapped to the CGMMV reference genome (GenBank accession NC_001801.1) using BBMap (version 38.87) [28] with default settings to assemble virus contigs of interest. BCFtools (version 1.12) was used to call consensus sequences from mapping alignments. Final consensus sequences were created and annotated in Geneious Prime (version 2022.2.1) from mapping and contig consensus sequences.
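As a rough illustration of the read-level filter described above (quality score ≥ 20, minimum read length 50), the following Python sketch trims the low-quality 3′ end of a read and then applies the length cut-off. It is not the Trim Galore! implementation: the running-sum trimming rule is modelled on the algorithm documented for Cutadapt, which Trim Galore! wraps, and the function names are hypothetical.

```python
def trim_3prime(quals, cutoff=20):
    """Return how many leading bases to keep after 3'-end quality trimming.

    Running-sum rule (assumed here, after the Cutadapt documentation):
    subtract the cutoff from each quality, accumulate from the 3' end,
    and cut where the running sum is lowest.
    """
    running = 0
    lowest = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:      # good-quality bases now outweigh the bad tail
            break
        if running < lowest:
            lowest = running
            keep = i
    return keep


def filter_read(seq, quals, cutoff=20, min_len=50):
    """Trim a read; return None when fewer than min_len bases remain."""
    keep = trim_3prime(quals, cutoff)
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]
```

A read with 60 high-quality bases followed by a low-quality tail would be kept at 60 bases, whereas a read with only 40 good bases would be discarded by the length filter.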

Tiled Amplicon Multiplex PCR and MinION Sequencing and Bioinformatics
Targeted whole genome sequencing (TWG-seq) using tiled amplicon multiplex PCR and an Oxford Nanopore Technologies (ONT) MinION device was used to generate genome sequences for nine CGMMV isolates from pollen, as described previously [22].

Sequence Analysis and Recombination Detection
The CGMMV genome sequences of the 35 Australian isolates from plants, the 4 seed interception isolates and the 9 Australian CGMMV genomes downloaded from GenBank (Table 3) were used to generate a multiple sequence alignment. The 5′- and 3′-untranslated regions (UTRs) were removed, and the full-length coding sequence regions were aligned using MUSCLE (version 3.8.1551) [29] implemented in MEGA-X [30]. Identical sequences were not included in further analysis (see Table 1). Amino acid alignments were produced for the protein coding regions 129 K and 186 K, the movement protein (MP) and the coat protein (CP). To compare the CGMMV genomes from Australian plants and intercepted seed isolates from this study with genomes of global isolates, all publicly available CGMMV genomes were retrieved from GenBank (Table S1). Sequences with degenerate or unknown bases were discarded. The untranslated regions were removed from the sequences, and a total of 171 CGMMV isolates, including 132 publicly available sequences from GenBank, were aligned using MUSCLE (version 3.8.1551) [29] implemented in MEGA-X [30]. Sequence Demarcation Tool version 1.2 (SDT version 1.2) [31] was used to generate and visualise the pairwise nucleotide sequence identity matrix. Amino acid percentage similarity matrices were generated using Geneious Prime (version 2022.2.1) and scoring matrix Blosum62 (threshold 0). RDP4 (Recombination Detection Program version 4) [32] was used to detect recombination breakpoints in the alignment of all CGMMV full coding sequence genomes. Default settings were used with the following detection methods: RDP [33], GENECONV [34], Bootscan [35], MaxChi [36], Chimaera [37], SiScan [38], PhylPro [39] and 3Seq [40]. Recombination signals that were identified by RDP4.9 as potentially arising through evolutionary processes other than recombination were omitted. A Bonferroni-corrected p-value < 0.05 for four or more recombination detection methods was considered credible evidence of a recombination event.
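The acceptance rule for recombination events can be expressed as a small check. This is only a sketch of the stated criterion, not of RDP4 itself: the function name and the dictionary layout are hypothetical, and the p-values are assumed to have already been Bonferroni-corrected.

```python
def is_credible_event(corrected_p, alpha=0.05, min_methods=4):
    """Apply the 'four or more methods with corrected p < 0.05' rule.

    corrected_p maps a detection method name to its Bonferroni-corrected
    p-value, or to None when that method reported no signal.
    """
    supporting = [
        method
        for method, p in corrected_p.items()
        if p is not None and p < alpha
    ]
    return len(supporting) >= min_methods
```

Under this rule, a candidate event flagged by only three methods is rejected even if each of those signals is individually strong.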

Phylogenetic Analysis
Using the best DNA and protein model determined using MEGA-X [30] for each multiple sequence alignment, maximum-likelihood (ML) phylogenetic trees were constructed with 1000 bootstrap replicates, and the resulting trees were visualised in Interactive Tree of Life (iTOL) version 6.5.8 [41].
Bayesian inference was performed using BEAST version 1.10.4 [42]. Four replicate Markov chain Monte Carlo (MCMC) runs, each with a chain length of 50 million, were carried out using the TN93 substitution model, the gamma and invariant sites heterogeneity model, tip dates (sampling dates), and a strict molecular clock model on the coding region nucleotide sequence alignment. The runs were assessed for convergence with Tracer (version 1.7.2) [43], and the first 25% of each run was discarded as burn-in. The resulting trees were merged using LogCombiner and summarised using TreeAnnotator, which are both part of the BEAST package [44]. The maximum-clade-credibility (mcc) tree was visualised using iTOL (version 6.5.8) [41].
Median-joining network analysis of variants was carried out using PopART 1.7 [45,46] for the coat protein and movement protein nucleotide sequences of all CGMMV isolates. Full-length coding sequences were analysed for clades containing Australian and seed interception isolates.
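Before a network is drawn, identical sequences are typically collapsed into variants (haplotypes), and variants are then compared by the number of single nucleotide differences between them. The sketch below illustrates that preprocessing idea only; it is not the PopART implementation, the function names are hypothetical, and it assumes gap-free sequences taken from the same alignment.

```python
from collections import defaultdict


def collapse_variants(aligned):
    """Group isolate labels that share an identical aligned sequence.

    aligned maps isolate label -> aligned nucleotide sequence.
    Returns (sequence, labels) pairs, largest variant first.
    """
    groups = defaultdict(list)
    for label, seq in aligned.items():
        groups[seq.upper()].append(label)
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[1][0]))


def snv_distance(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
```

Variants separated by a distance of 1 or 2 would appear as near neighbours in a network built from these data.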

Genetic Variation
Analysis of genomic variation was carried out on the 171 CGMMV isolate data set for all coding sequence regions. Sliding window analyses in DnaSP (version 6.12.03) [47] were used to evaluate the number of polymorphic sites, the total number of mutations, the nucleotide diversity index (Pi) and the number of variants. The neutrality test statistics Tajima's D, Fu and Li's D* and Fu and Li's F* were also calculated, and the statistical significance of each was evaluated against the null hypothesis of neutral evolution.
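To make these summary statistics concrete, the sketch below computes the number of polymorphic (segregating) sites S, the average number of pairwise nucleotide differences k, and Tajima's D using the standard formula from Tajima (1989). It is a simplified illustration rather than a reproduction of DnaSP: the function names are hypothetical, and gaps and ambiguous bases are assumed to be absent.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns with more than one nucleotide state (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all sequence pairs (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and k."""
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D compares k with the expectation S / a1, scaled by its standard deviation
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

Nucleotide diversity per site is k divided by the number of sites analysed. With the values reported for the Australian sequences in the Genomic Variation results (n = 44, S = 137, k = 19.14905 over 6188 sites), this gives roughly 0.0031 per site, and `tajimas_d(44, 137, 19.14905)` is negative because k falls below S/a1, which is the pattern expected when most variation is present at low frequency.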

RT-PCR and RT-qPCR
Conventional and real-time RT-PCR were performed on 37 RNA extracts from the Australian plant and seed interception isolates, noting that five isolates were provided as raw sequencing data (Table 1). CGMMV was detected from the RNA extracts using at least one PCR test: 32/37 extracts tested positive in the coat protein RT-PCR assay [19], 34/37 in the RNase helicase subunit RT-PCR assay [20] and 35/36 in the movement protein RT-qPCR assay [21] (Table S2). NT isolate 24,501 (NT-21014-03) was originally tested by collaborators, using the two conventional RT-PCR assays. The RNA provided was used to prepare the sequencing library.

Genome Sequences
The raw sequence reads obtained for the 42 samples that were sequenced in this study using MiSeq, NovaSeq, or MinION (Table 1) ranged from 4698 to 20,818,214 per sample (Table 1 and Table S3). After trimming, 4371–20,779,456 reads remained for de novo assembly and reference mapping. De novo assembly produced 1–138 virus contigs for the 33 CGMMV isolates sequenced using MiSeq and NovaSeq. Mapped reads for all sequencing methods ranged from 217 to 16,914,482 (Table S3). Consensus sequences were generated for 42 isolates from Australia (n = 38) and seed interceptions (n = 4) with lengths of 6342–6424 nt. Coding sequence regions were obtained for all isolates and used for further analysis. The consensus sequences generated in this study for 39 isolates have been deposited in GenBank (accession numbers OQ198372–OQ198410).
The recombination analysis of all the CGMMV genomes using RDP4 did not detect any recombinants.

Phylogenetic Analysis
Bayesian maximum-clade-credibility trees were constructed using the full coding region nucleotide sequences of the Australian, seed interception and global isolates sourced from GenBank. The mcc tree inferred for the 171 CGMMV isolates (Figure 3) produced two major clades with greater than 95% posterior probability support and is consistent with previous studies [8,16]. Sub-clades were defined in this study and were based on posterior probability values as follows. Clade 1 consists of two sub-clades with isolates from France, Latvia, the Netherlands, Russia and Spain in sub-clade 1A and from Germany, Israel, Japan, Kuwait, the Netherlands and the USA in sub-clade 1B. Clade 2 contains four sub-clades that have been designated 2A, 2B, 2C and 2D (Figure 3). Sub-clade 2A contains single isolates from Israel (KF155231.1) and Thailand (MH271423.1). Sub-clade 2B contains seven isolates from Canada and single isolates from China and Japan. Sub-clade 2C contains isolates originating from Australia, Bulgaria, Canada, France, Greece, India, Israel, the Netherlands and the USA (Figure 3). Sub-clade 2D isolates all originated from Asia (China, Japan, Korea, South Korea, Taiwan and Thailand). All Australian isolates and three seed interception isolates are members of sub-clade 2C. The seed interception isolate SI-2018-01 aligns with Asian isolates in sub-clade 2D. Posterior probability support for clades and sub-clades is high (>95). Support for further clades in sub-clade 2C is low (<50).

The mcc tree generated for the Australian and seed interception isolates (Figure 4) shows well-supported clades consisting of all Australian isolates and two seed interception isolates, SI-2016-01 and SI-2016-02, clustering separately to seed isolates SI-2015-01 and SI-2018-01. The Australian clade separates into two clusters; however, posterior probability support for these clusters is low (<60). Cluster 1 contains all SA and WA genomes, together with two genomes from each of NT and QLD and one genome from NSW. Within Cluster 2, two seed interception genomes and the remaining Australian genomes group together. Seed isolates SI-2015-01 and SI-2018-01 form a separate cluster.
The full coding sequences of the CGMMV isolates clustering in sub-clades 2B and 2C of the mcc tree (Figure 3) were used to generate a median-joining network (Figure 5). All Australian isolates and seed interception isolates (GenBank accessions OQ198399, OQ198400 and OQ198410) are located in Clusters 1, 2 and 3. Cluster 1 includes isolates MH427279.

Genomic Variation
DnaSP analysis was carried out on 171 CGMMV isolates for the full coding sequence region and the 129 kDa, 186 kDa, MP and CP regions (Table 4). Sequences were grouped and analysed by collection location. For the 44 Australian sequences, 137 of 6188 sites were found to be polymorphic, and the average number of nucleotide differences, k, was calculated to be 19.14905. Nucleotide diversity was low across all regions analysed. Negative Tajima's D values, indicative of low frequency variation within a population and implying population growth or positive selection, were observed for Asian, Australian and North American isolates across all coding sequence regions, and these were always statistically significant for the Asian isolates. Fu and Li's D* and F* values, which also indicate positive selection and population growth, were always negative and statistically significant for Asian, Australian and North American isolates.
can isolates and 20 Australian, 2 European and 2 American isolates and 2 isolates of unknown origin, respectively. The remaining five variants, 58, 60–62 and 64, are exclusively Australian isolates. Thirteen of the 84 movement protein haplotypes include Australian isolates. Variant 42 includes five Australian, one European and ten North American isolates; variant 46 contains four Australian isolates, one each from Europe, the Middle East and North America, and two of unknown origin.

Discussion
The detection and subsequent spread of CGMMV in Australia led to the implementation of emergency measures in 2014 to prevent the risk of further introductions via imported seeds associated with CGMMV. Prior to this, cucurbit seed was not screened for the virus, and questions have arisen regarding the number of introductions that could have occurred pre-2014. Based on the analysis presented here, the introduction of CGMMV into Australia is related to the reporting of CGMMV in North America [17,48,49] and new host reports in Israel [50] and Bulgaria [51], which occurred during the same period (2013–2015) [11]. These isolates are all located in sub-clade 2C (Figure 3) and either share a haplo-group or are within 5 to 10 single nucleotide polymorphisms across the CGMMV genome of Australian variants. The high nucleotide identity between these isolates suggests a shared origin and signifies a global spreading event that is likely to have been facilitated by international seed movement.
Examination of the Australian population of CGMMV using sequence, phylogenetic and genetic variation analyses offers a few possible scenarios for introduction into the country. The first scenario is that there has been a single introduction and then spread via seed, seedling or other mechanical means to several locations and hosts in Australia. The second scenario is that there has been a minimum of two introductions of CGMMV into separate locations in Australia, which may have a common origin offshore. Support for the first scenario is present in the sequence, phylogenetic and genetic variation analyses of all Australian CGMMV isolates. Full coding sequence genomes are highly similar, with less than 0.5% difference in pairwise nucleotide identity and at most a 0.62% difference in amino acid similarity. Australian variants in the median-joining networks for the coat and movement proteins are separated by 1–2 single nucleotide variants at most. Examination of genetic diversity shows a high variant diversity (Vd) and low nucleotide diversity and reflects the abundance of singletons in the Australian genomes. The negative Tajima's D and Fu-Li values are statistically different from those expected from a neutral evolution model and are indicative of population expansion or positive selection [52][53][54]. The tip-dated mcc tree estimated using global and Australian CGMMV isolates shows strong posterior probability support for sub-clade 2C containing all Australian isolates, but this support is reduced (<50) at the next split in the tree (Figure 3).
Together, these results relating to the diversity of CGMMV in Australia suggest a rapid and recent population expansion that could have occurred following a single introduction.
The second proposed scenario considers the possibility of multiple introductions from the same or similar sources. Whilst sequence analysis of the Australian population does reveal a high level of nucleotide and amino acid similarity, examination of the full coding sequence median-joining network for sub-clades 2C and 2D (Figure 5) shows a partitioning of Australian isolates. This division is also shown in the Australian and seed interception mcc tree (Figure 4), with Clusters 1 and 2 each representing a separate event; however, posterior probability support for this is low. The variant analysis for both the CP and MP indicates the presence of two well-populated haplo-groups. CP variant 29 and MP variant 42 are both made up of sequences from Australia (n = 10 and n = 5), North America (n = 13 and n = 10) and Europe (one each), with common sequences across both haplo-groups. Other sequences from Australia, Europe, the Middle East and North America are all CP variant 41 and MP variant 46. However, the degree of separation between variants does not necessarily support different origins for each introduction. Within Cluster 1, it could also be suggested that a third introduction event may have occurred, represented by the two NT isolates in the upper clade of the tree (Figure 4). However, since these are two of the earliest isolates that were found, it could be that they represent an early variant from which the other Cluster 1 variants emerged.
It is possible that the larger clade within Cluster 1, containing isolates from NSW, QLD, SA and WA (Figure 4), emerged from a similar source of seed, rootstocks or nursery-produced plants. The distribution of virus-contaminated seeds or cucurbit seedlings on infected rootstocks could explain the movement of CGMMV to diverse locations and growing conditions, as reported in an analysis of CGMMV in WA [55], and account for the range of hosts represented in Group 1. Rootstocks, such as C. moschata or the hybrid C. moschata × C. maxima, are commonly used for commercial crops of watermelons, melons and cucumbers [56]. Quality cucurbit rootstocks can enhance scion growth and provide resistance to soilborne diseases, such as Fusarium wilt, Verticillium wilt and gummy stem blight [56,57], and while there are benefits to production, there is increased risk of virus introduction via seeds, seedlings and the grafting process.
Alternatively, the stable and infectious nature of CGMMV virions offers a range of transmission pathways within propagation and production settings [8,58]. Establishment of CGMMV after introduction can lead to further spread by farm machinery, grafting tools, seed trays, clothes, shoes and hands [8,59,60]. Dispersal of CGMMV can continue with subsequent planting in soils contaminated with virus-positive plant debris [59,61] and irrigation systems, such as drip and flow irrigation [59]. Movement of virus-positive fresh produce and associated contaminated surfaces can introduce inoculum into post-harvest settings and, in turn, back into production areas. The spread of CGMMV to SA has been linked to infected properties in Geraldton, WA; however, the transmission pathway is unknown [13].
The Queensland isolate QLD-2015-01 (OQ198392) located in the upper clade of Cluster 2 (Figure 4) was detected as a result of seed trace-back following the NT outbreak in 2014 [62]. In this case, seedlings were generated in a QLD propagation facility using seed from the NT and subsequently cultivated on the Charters Towers property. The NSW cucumber isolate NSW-2019-01 (OQ198372) was collected from a glasshouse facility in Western Sydney [63]; however, the source of this outbreak is not clear-cut, with contaminated seeds, seedlings, shared cultivation equipment, soil or water all potential sources of inoculum.
The remaining Cluster 2 isolates resulted from pollinator movement of virus and infection of weeds either by pollen or contact with virus-infected crop debris. The detection of CGMMV in Apis mellifera collected from commercial beehives in June 2014 [64] indicates that the virus was present in the NT for a period prior to being detected on crops in September 2014. CGMMV genome sequences from pollen sampled from hives between 2017 and 2020 do not deviate significantly from the first introductions, and accumulated mutations may have arisen from adaptation to an increased host range and changes to environmental conditions [65]. It is worth noting that all pollen samples were collected from different hives across the four-year surveillance period, and it is unclear whether the sequenced virus has been hive-stored for a long period or recently collected by bees. If the latter is the case, then this could be indicative of the presence of the virus population in the bee foraging environment.
Previous studies have demonstrated the involvement of pollinators in the movement of CGMMV during foraging [9,66]. Furthermore, viable CGMMV was detected in adult bees, pollen and honey sampled from apiaries during the 2014 incursion response [67], and subsequent work investigating the role of honey bees in CGMMV epidemiology showed that honey bees can introduce the virus into healthy flowers, resulting in disease symptoms [68]. Viruses from infected plants or from positive beehives both produced symptoms in C. lanatus [68]. The range of non-crop species present in Cluster 2 also points to the role that pollinators and green bridges may play in the recurrence and spread of CGMMV infection in a region, highlighting the importance of weed control for disease management for growers. A range of weed species in the Cucurbitaceae, Euphorbiaceae, Lamiaceae and Solanaceae families are susceptible to CGMMV [7], and associated seeds have the potential to be a source of virus infection for future crops; however, further studies are needed to examine virus viability and infection rates across different weed hosts.
The analysis of genetic diversity, sequence similarity and phylogenetic relatedness presented in this study supports the presupposition of a common offshore source of CGMMV present in Australia. Support for the multiple-introduction scenario is diminished by the phylogenetic analysis and the lack of support for multiple clades within the Australian population. Examination of the movement of seeds, seedlings, and pre- and post-harvest equipment and produce within Australia provides sufficient evidence for the modes of spread of CGMMV into cucurbit growing regions.
The global spread of CGMMV has been accelerated by the international movement of virus-infected seeds [8], and the outbreaks in Australia and California in 2013–2016 are associated with imported cucurbit seeds [15,17]. Unlike California, where the diversity of the CGMMV population was shown to be the result of multiple introductions [17], the low level of diversity in this study indicates that there was one introduction of CGMMV into Australia, which likely occurred around the time of a global spreading event. It is unlikely there have been further introductions since the introduction of emergency phytosanitary measures for cucurbit seeds associated with CGMMV in 2014, highlighting their effectiveness in mitigating risk to the Australian vegetable and melon industries. Between 2017 and 2022, CHS (AVR) detected CGMMV in 0.5–2.4% of seed lots tested, indicating that contaminated seed continues to circulate on the global market. The current biosecurity import controls on cucurbit seed continue to be a necessary and effective measure against the introduction of new variants, and the continuation of these regulations is vital to the Australian vegetable industries. Resourcing can focus on production and post-harvest systems to manage farm hygiene, cultivation and pollination practices, and weed control, while risk-management measures are in place at the Australian border. This genomic dataset generated for the current Australian CGMMV population provides a baseline for comparison with future detections in Australian vegetable-growing regions.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15030743/s1, Table S1: CGMMV genomes and associated metadata retrieved from GenBank; Table S2: RT-PCR and RT-qPCR results for new Australian CGMMV isolates; Table S3: High-throughput sequencing data; Table S4: Percent nucleotide and amino acid identities between MH427279.1 (Apis mellifera), the publicly available Australian genomes and the genomes assembled in this study; Table S5: SDT pairwise nucleotide similarity matrix; Table S6: Genetic variation, polymorphism and neutrality test statistic values of CGMMV coding sequence regions for the Australian CGMMV population.
Funding: This project has been funded by Hort Innovation Australia Limited (VG16086) using vegetable industry levies with co-investment from the Queensland Department of Agriculture and Fisheries; the Victorian Department of Jobs, Precincts and Regions; the Northern Territory Department of Industry, Tourism and Trade; the Western Australia Department of Primary Industries and Regional Development; and the University of Tasmania. Hort Innovation is the grower-owned, not-for-profit research and development corporation for Australian horticulture. This research was conducted using the facilities of Agriculture Victoria.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.